Bayesian Real-Time Dynamic Programming

Authors

  • Scott Sanner
  • Robby Goetschalckx
  • Kurt Driessens
  • Guy Shani
Abstract

Real-time dynamic programming (RTDP) solves Markov decision processes (MDPs) when the initial state is restricted, by focusing dynamic programming on the envelope of states reachable from an initial state set. RTDP often provides performance guarantees without visiting the entire state space. Building on RTDP, recent work has sought to improve its efficiency through various optimizations, including maintaining upper and lower bounds to both govern trial termination and prioritize state exploration. In this work, we take a Bayesian perspective on these upper and lower bounds and use a value of perfect information (VPI) analysis to govern trial termination and exploration in a novel algorithm we call VPI-RTDP. VPI-RTDP leads to an improvement over state-of-the-art RTDP methods, empirically yielding up to a three-fold reduction in the amount of time and number of visited states required to achieve comparable policy performance.
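The role that upper and lower bounds play in trial termination can be illustrated with a minimal sketch. It is a generic bounded RTDP loop, not the authors' VPI-RTDP: the five-state chain MDP, the discount factor, the gap-based stopping rule, and the names `transitions`, `q_value`, and `bounded_rtdp` are all hypothetical choices made for illustration. A VPI-based method would replace the simple gap test with a value-of-perfect-information criterion, whose details are not given in the abstract.

```python
import random

# Hypothetical toy MDP (an assumption, not from the paper): a chain of states
# 0..4 where state 4 is an absorbing, cost-free goal. Every action taken in a
# non-goal state costs 1, and future costs are discounted by GAMMA.
GAMMA = 0.95
GOAL = 4
STATES = range(5)
ACTIONS = ("step", "jump")


def transitions(s, a):
    """Return (probability, next_state) pairs for taking action a in state s."""
    if a == "step":
        return [(0.8, min(s + 1, GOAL)), (0.2, s)]
    return [(0.5, min(s + 2, GOAL)), (0.5, s)]


def q_value(v, s, a):
    """One-step lookahead cost of action a in state s under value table v."""
    return 1.0 + GAMMA * sum(p * v[t] for p, t in transitions(s, a))


def bounded_rtdp(start=0, epsilon=1e-3, max_trials=10_000, seed=0):
    rng = random.Random(seed)
    # Lower bound 0 and upper bound 1 / (1 - GAMMA) bracket the true
    # discounted cost-to-go; the goal has cost 0 under both bounds.
    lower = {s: 0.0 for s in STATES}
    upper = {s: 0.0 if s == GOAL else 1.0 / (1.0 - GAMMA) for s in STATES}
    visited = set()
    for _ in range(max_trials):
        if upper[start] - lower[start] < epsilon:
            break  # the start state's value is known to within epsilon
        s, steps = start, 0
        # A trial ends at the goal or at a state whose bounds have closed.
        while s != GOAL and upper[s] - lower[s] >= epsilon and steps < 100:
            visited.add(s)
            # Bellman backups on both bounds at the current state only.
            upper[s] = min(q_value(upper, s, a) for a in ACTIONS)
            greedy = min(ACTIONS, key=lambda a: q_value(lower, s, a))
            lower[s] = q_value(lower, s, greedy)
            # Sample a successor of the action that is greedy on the lower bound.
            probs, successors = zip(*transitions(s, greedy))
            s = rng.choices(successors, weights=probs)[0]
            steps += 1
    return lower, upper, visited
```

Calling `bounded_rtdp()` returns the two bound tables and the set of states that were actually backed up, so the gap `upper[0] - lower[0]` can be inspected directly. Only states encountered during trials are ever updated, which is the sense in which RTDP-style methods avoid sweeping the whole state space.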

Similar resources

Cost Analysis of Acceptance Sampling Models Using Dynamic Programming and Bayesian Inference Considering Inspection Errors

Acceptance sampling models have been widely applied in companies for inspecting and testing raw materials as well as final products. Many lots of items are produced each day in industry, so it may be impossible to inspect or test every item in a lot. Acceptance sampling models only provide the guarantee for the producer and consumer that the items in the lots are acco...

Comparison of Kullback-Leibler, Hellinger and LINEX with Quadratic Loss Function in Bayesian Dynamic Linear Models: Forecasting of Real Price of Oil

In this paper we examine the application of Kullback-Leibler, Hellinger and LINEX loss functions in a Dynamic Linear Model (DLM), using 106 years of real oil price data from 1913 to 2018, with respect to the asymmetry problem in filtering and forecasting. We use the DLM form of the basic Hotelling model under quadratic, Kullback-Leibler, Hellinger and LINEX loss functions, trying to address the ...

A DSS-Based Dynamic Programming for Finding Optimal Markets Using Neural Networks and Pricing

One of the substantial challenges in marketing efforts is determining optimal markets, specifically in market segmentation. The problem is more controversial in electronic commerce and electronic marketing. Consumer behaviour is influenced by different factors and thus varies in different time periods. These dynamic impacts lead to the uncertain behaviour of consumers and therefore harden the t...

A Defined Benefit Pension Fund ALM Model through Multistage Stochastic Programming

We consider an asset-liability management (ALM) problem for a defined benefit pension fund (PF). The PF manager is assumed to follow a maximal fund valuation problem facing an extended set of risk factors: due to the longevity of the PF members, the inflation affecting salaries in real terms and future incomes, interest rates and market factors affecting jointly the PF liability and asset p...

A New Approach to Distribution Fitting: Decision on Beliefs

We introduce a new approach to distribution fitting, called Decision on Beliefs (DOB). The objective is to identify the probability distribution function (PDF) of a random variable $X$ with the greatest possible confidence. It is known that $f_X$ is a member of $S = \{f_1, \ldots, f_m\}$. To reach this goal and select $f_X$ from this set, we utilize stochastic dynamic programming and formulate this proble...

Extending Evolutionary Programming Methods to the Learning of Dynamic Bayesian Networks

Recent work has shown that for finding static Bayesian network structures, an Evolutionary Programming (EP) approach that exploits the description length of single links is better suited than a standard Genetic Algorithm (GA). We extend this work to find good dynamic Bayesian network structures that can have large time lags. We do this through the use of a new representation of dynamic Bayesian...

Publication date: 2009